Methods for establishing and utilizing sensorimotor programs

ABSTRACT

A method for establishing sensorimotor programs includes specifying a concept relationship that relates a first concept to a second concept and establishes the second concept as higher-order than the first concept; training a first sensorimotor program to accomplish the first concept using a set of primitive actions; and training a second sensorimotor program to accomplish the second concept using the first sensorimotor program and the set of primitive actions.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/043,146, filed on 23 Jul. 2018, which claims the benefit of U.S. Provisional Application Ser. No. 62/535,703, filed on 21 Jul. 2017, which is incorporated in its entirety by this reference.

TECHNICAL FIELD

This invention relates generally to the artificial intelligence field, and more specifically to new and useful methods for establishing and utilizing sensorimotor programs.

BACKGROUND

While computer vision remains a complex problem in artificial intelligence, recent achievements such as the recursive cortical network (RCN) have enabled computers to identify objects from visual data efficiently and with high accuracy. However, just as with human vision, object recognition is only a part of the skillset needed to effectively interact with an environment. Humans may observe how objects interact with each other to infer properties of those objects; for example, by observing how a sphere reacts when dropped onto a hard surface, a human may be able to infer whether a ball is made of rubber, cork, or steel. Often, this observation occurs as a result of direct interaction with the environment; e.g., a human intentionally drops a ball onto a hard surface (or squeezes the ball, etc.) as an alternative to passively waiting for the environment to produce such a situation naturally. This knowledge makes it easier to accurately interpret past events, and likewise, to predict future events. Unfortunately, traditional approaches to computer vision more often embody the approach of the passive observer, which restricts their ability to achieve comprehension of an environment in a complete and generalizable sense. Thus, there is a need in the artificial intelligence field to create new and useful methods for establishing and utilizing sensorimotor programs. This invention provides such new and useful methods.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a chart representation of a method of an invention embodiment; and

FIG. 2 is a chart representation of a concept hierarchy of a method of an invention embodiment.

DESCRIPTION OF THE INVENTION EMBODIMENTS

The following description of the invention embodiments is not intended to limit the invention to these invention embodiments, but rather to enable any person skilled in the art to make and use this invention.

1. Method for Establishing Sensorimotor Programs

A method 100 for establishing sensorimotor programs includes specifying a concept relationship S120, training a first sensorimotor program S130, and training a second sensorimotor program using the first sensorimotor program S140, as shown in FIG. 1. The method 100 may additionally or alternatively include generating a sensorimotor training curriculum S110 and/or executing the first and second sensorimotor programs S150.

As discussed in the background section, traditional approaches to computer vision often focus on systems and methods that derive information from their environments in a passive manner. For example, a computer vision system for an autonomous vehicle might be trained to distinguish various objects based on their visual characteristics as observed by a camera. While this approach to computer vision is straightforward, it often suffers from two disadvantages. The first is that sensory input is useful to distinguish objects or environmental states only to the extent that the sensory input differs substantially between those objects and environmental states. For example, a computer that identifies objects based on similarity of appearance may have trouble distinguishing objects that appear to be similar; e.g., such a computer may not be able to distinguish a person from a statue. Some computer vision approaches attempt to solve this issue by taking more data (e.g., different sensor types or attempting to infer indirectly sensed information such as inferring physical properties of an object from its movement), but these approaches have drawbacks as well. The second issue is that this approach results in poor generalizability; e.g., it can be difficult to figure out how to treat a new detected object based solely on similarity to trained objects. For example, if a robot is trained to interact with a pen in a certain way, that may not accurately inform the robot how to interact with a laser pointer (even though pens and laser pointers may look quite similar).

To address these problems, some researchers have turned toward models of perception used to describe the behavior exhibited by natural consciousnesses (e.g., those of animals and humans). One such model is the sensorimotor theory of perceptual consciousness. This theory attempts to explain the perception of “feel” as arising from an agent engaging in a particular sensorimotor skill and attending to the fact that they are engaged in exercising that skill. It follows from this theory that the quality of a sensation is based on the way an agent interacts with its environment and not solely based on passive observation.

This reflects the real-world behavior of many animals when placed into new environments. Often, animals interact with their environments in what at first may appear to be random ways. Their behavior leads to environmental feedback that either rewards or punishes the behavior; the link established between the feedback (e.g., sensory cues) and the behaviors (e.g., exploratory motor actions) may be referred to as a sensorimotor contingency. Over time, animals refine their behavior to attain rewards while avoiding punishment (reinforcement learning). The process of establishing sensorimotor contingencies is part of “exploration”, and the use of them after establishment is known as “exploitation”.

Animals may utilize exploration and exploitation efficiently: when encountering a new environment, animals explore the environment until finding a rewarding set of behaviors. Once a rewarding set of behaviors is established, animals may continue to exploit the behaviors as long as they continue being rewarding. If a set of behaviors ceases to be rewarding, animals may then resume the process of exploration.

While reinforcement learning in general has been studied extensively in the context of machine learning, applications utilizing sensorimotor contingencies are far less common. Further, most of these applications utilize relatively simple reinforcement learning strategies (e.g., learning solely via random exploration). This can limit the efficiency and/or generalizability of these applications.

In contrast, the method 100 focuses on the establishment of sensorimotor programs that build upon previously established sensorimotor programs to learn new behavior. By utilizing existing sensorimotor programs in the training process, the method 100 may more quickly learn complex behaviors than would otherwise be possible. Further, to the extent that the method 100 includes environment generation (a la S110), the method 100 may additionally exploit this advantage by intentionally generating environments in a manner reflecting the role of particular simple concepts in representing more complex concepts. Further, the sensorimotor programs generated by the method 100 may feature enhanced generalizability compared to traditional approaches thanks to hierarchical relationships between concepts (this may also be thought of as an advantage for S150).

The method 100 is preferably implemented by a partially observable Markov decision process (POMDP) operating on a neural network. Neural networks and related systems, including recursive cortical networks (RCNs), convolutional neural networks (CNNs), hierarchical compositional networks (HCNs), HMAX models, Slow Feature Analysis (SFA) systems, and Hierarchical Temporal Memory (HTM) systems may be used for a wide variety of tasks that are difficult to complete using standard rule-based programming. These tasks include many in the important fields of computer vision and speech recognition.

Neural networks and related systems can be represented as distributed processing elements that implement summation, multiplication, exponentiation or other functions on the elements' incoming messages/signals. Such networks can be enabled and implemented through a variety of implementations. For example, a system operating the method 100 may be implemented as a network of electronically coupled functional node components. The functional node components can be logical gates arranged or configured in a processor to perform a specified function. As a second example, the system may be implemented as a network model programmed or configured to be operative on a processor. The network model is preferably electronically stored software that encodes the operation and communication between nodes of the network. Neural networks and related systems may be used in a wide variety of applications and can use a wide variety of data types as input such as images, video, audio, natural language text, analytics data, widely distributed sensor data, or other suitable forms of data.

As described previously, the method 100 enables both more efficient learning and execution of machine learning tasks related to environmental perception, thus serving as a specific improvement to computer-related technology. The method 100 may enable more memory-efficient, faster, more generalizable, and more compact representation of any automated computer-controlled system that interacts with its environment. The method 100 is not intended in any form to cover an abstract idea and may not be performed without a computing system.

The sensorimotor programs (SMPs) of the method 100 (also referred to as sensorimotor contingencies) embody behaviors that can be used to represent an agent's knowledge of the environment. Each sensorimotor program jointly represents one or more behaviors and an outcome. Each sensorimotor program additionally is capable of signaling its outcome (enabling a high-level sensorimotor program to execute and act based on the output of lower-level sensorimotor programs). The ability of SMPs to generate outcome signals enables the outcome signals to be compared with global truth during training, and enables rewards to be based not only on whether an SMP achieves a desired outcome but also on whether the SMP signals that outcome; e.g., if the SMP is achieving an outcome but does not signal properly, the reward can be structured differently than if it achieves an outcome and signals properly. This is not possible in traditional reinforcement learning systems.

Two examples of sensorimotor programs include classification SMPs and bring-about SMPs. Classification SMPs perform actions in the environment to determine whether a concept is present in the environment or not. For example, a classification SMP may address the concept of “containment” (i.e., is the agent located within a bounded container, such as a fenced-in yard) and may signal “yes” or “no”. Bring-about SMPs perform actions in the environment to bring about a particular state. For example, a bring-about SMP may attempt to bring about containment (e.g., if not already within a bounded container, attempt to get into a bounded container). If the bring-about SMP is able to bring about containment, the SMP may signal “yes”. SMPs may additionally or alternatively signal outcomes in any number of ways and in any manner.

SMPs may additionally or alternatively be constrained in any manner; for example, SMPs may terminate after a threshold number of processing steps is achieved or after a threshold time has elapsed.

In one implementation of an invention embodiment, SMPs may signal outcomes using trinary logic. In this implementation, a classification SMP may, for instance, signal an outcome of “1” if a concept is found to be true and a “−1” if a concept is found not to be true. Although only two outcomes are possible, a third value is useful: in some cases, it may be desirable to maintain a vector that stores, for each SMP, a record of the result returned by the SMP on last execution. In these cases, it may be further desirable to initialize the SMPs at a value that does not correspond to either of the two outcomes (e.g., “0”) so that the method 100 may effectively determine if a given SMP has been executed to completion since initialization.
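
The trinary signaling and per-SMP result vector described above might be sketched as follows. This is a minimal illustration only; the names `Outcome`, `OutcomeRecord`, and the SMP names are hypothetical and do not appear in the embodiments.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Trinary outcome signal. The values follow the text; the names are illustrative."""
    FALSE = -1    # concept found not to be true
    NOT_RUN = 0   # SMP has not run to completion since initialization
    TRUE = 1      # concept found to be true


class OutcomeRecord:
    """Stores, for each SMP, the result returned on its last execution."""

    def __init__(self, smp_names):
        # Every SMP starts at 0, a value that matches neither real outcome.
        self._last = {name: Outcome.NOT_RUN for name in smp_names}

    def record(self, name, outcome):
        self._last[name] = Outcome(outcome)

    def has_completed(self, name):
        return self._last[name] is not Outcome.NOT_RUN

    def as_vector(self):
        return [int(value) for value in self._last.values()]


record = OutcomeRecord(["contained", "bounded_left"])
record.record("contained", 1)
print(record.as_vector())
```

Because the initial value is distinct from both real outcomes, `has_completed` can tell an SMP that returned “−1” apart from one that never finished.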

SMPs of the method 100 may additionally or alternatively signal outcomes in any manner. Further, the systems executing the method 100 may maintain memory of SMP outcomes in any manner (or not at all).

S110 includes generating a sensorimotor training curriculum. S110 functions to generate a set of environments where each environment is associated with one or more concepts. These environments are then used to train sensorimotor programs (e.g., to classify based on or bring about the concepts). For each concept, S110 preferably generates a plurality of environments that represent the concept (additionally or alternatively, S110 may map concepts to environments in any manner).

S110 may generate the sensorimotor training curriculum in any manner. In one implementation of an invention embodiment, S110 generates the sensorimotor training curriculum for a set of concepts automatically by a rejection sampler working in tandem with a general-purpose constraint satisfaction problem (CSP) solver. Environment distributions may be specified in a fragment of first-order logic, using a pre-defined vocabulary of unary and binary predicates that can be combined using conjunction and negation. To generate environments, generators (e.g., conjunctions of first-order logic expressions that specify random samples) may be sampled uniformly; then the generator itself is invoked. For classification concepts, the concept filter is then used to filter generated environments into those that satisfy a given concept and those that do not. Then, these filtered environments are assigned a reward function. For example, for an environment with “Concept A” present, the reward function may reward +1 for SMPs that output a “1” signal (corresponding to concept present), −1 for SMPs that output a “−1” signal (corresponding to concept not present), and 0 otherwise (e.g., if an SMP times out). Likewise, for an environment with “Concept A” not present, the reward function may reward +1 for SMPs that output a “−1” signal (corresponding to concept not present), −1 for SMPs that output a “1” signal (corresponding to concept present), and 0 otherwise (e.g., if an SMP times out). For bring-about concepts, the concept filter evaluation may be performed dynamically (e.g., at each step of SMP execution, rewarding +1 if and only if the concept is true AND the SMP has signaled appropriately and 0 otherwise).
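
The filtering and reward assignment for classification concepts might look like the sketch below. It is a simplified illustration under stated assumptions: it uses plain rejection sampling only (no CSP solver and no first-order-logic generators), and the toy 1-D environment, the `is_contained` filter, and all function names are hypothetical.

```python
import random


def sample_environment(rng, width=7):
    """Hypothetical generator: a 1-D row of cells with random walls and one agent."""
    agent = rng.randrange(width)
    walls = [i != agent and rng.random() < 0.4 for i in range(width)]
    return {"agent": agent, "walls": walls}


def is_contained(env):
    """Concept filter (assumed): a wall exists somewhere on each side of the agent."""
    agent = env["agent"]
    return any(env["walls"][:agent]) and any(env["walls"][agent + 1:])


def rejection_sample(rng, want_concept, max_tries=10_000):
    """Draw environments until one matches the requested concept label."""
    for _ in range(max_tries):
        env = sample_environment(rng)
        if is_contained(env) == want_concept:
            return env
    raise RuntimeError("no matching environment found")


def classification_reward(concept_present, signal):
    """+1 for a correct signal, -1 for a wrong one, 0 otherwise (e.g., a timeout)."""
    if signal not in (1, -1):
        return 0
    truth = 1 if concept_present else -1
    return 1 if signal == truth else -1


def build_curriculum(seed, n_per_class):
    """Return (environment, concept_present) pairs for both labels."""
    rng = random.Random(seed)
    return [
        (rejection_sample(rng, present), present)
        for present in (True, False)
        for _ in range(n_per_class)
    ]


curriculum = build_curriculum(seed=0, n_per_class=3)
```

Each pair in `curriculum` carries its label, so `classification_reward(present, signal)` can score whatever an SMP signals in that environment.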

Note that in general, reward functions for SMPs may be implemented in any manner. For example, a bring-about concept SMP may receive a reward if the concept is made true even if the SMP has not signaled correctly (e.g., at each step of SMP execution, rewarding +1 if the concept is true but the SMP has not properly signaled, +2 if the concept is true and the SMP has properly signaled, and 0 otherwise).
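
The two per-step bring-about reward schemes described above (the strict scheme and the scheme that also rewards an unsignaled success) can be compared in a short sketch. The function names are hypothetical; the reward values are the ones given in the text.

```python
def strict_step_reward(concept_true, signaled):
    """+1 if and only if the concept is true AND the SMP has signaled; else 0."""
    return 1 if (concept_true and signaled) else 0


def graded_step_reward(concept_true, signaled):
    """+1 if the concept is true but unsignaled, +2 if true and signaled; else 0."""
    if not concept_true:
        return 0
    return 2 if signaled else 1


# An illustrative episode: the concept becomes true at step 2, the SMP signals at step 3.
episode = [(False, False), (False, False), (True, False), (True, True)]
strict_total = sum(strict_step_reward(c, s) for c, s in episode)
graded_total = sum(graded_step_reward(c, s) for c, s in episode)
```

Under the graded scheme the step where the concept is true but unsignaled still earns a reward, which gives the learner a signal before it has learned to report its outcome.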

SMP training environments are preferably simulations of an environment for which utilization of sensorimotor programs is desired, but may additionally or alternatively be representative environments. For example, a set of SMPs intended to function only in virtual environments may utilize simpler variations of these virtual environments or (if possible) actual representative environments. A set of SMPs intended to function in real-world environments may utilize virtual simulations of those real-world environments (e.g., a set of SMPs intended to operate a robot arm may be trained on simulated visual data and physics); additionally or alternatively, such SMPs may be trained in a real-world environment (e.g., a physical environment that is reconfigured to represent various concepts).

Data used for generating or simulating environments may include images, video, audio, speech, medical sensor data, natural language data, financial data, application data, physical data, traffic data, environmental data, etc.

S120 includes specifying a concept relationship. S120 functions to establish relationships between concepts that can be exploited during training (and may aid in increasing generalizability even after training). As previously discussed, SMPs may be reused (i.e., SMPs may call each other) during and after training.

From a training perspective, it may be more efficient for SMPs to have an existing hierarchy (e.g., based on complexity) that determines what other SMPs a given SMP may call. Additionally or alternatively, a hierarchy may be used in specifying how SMPs are trained.

For example, as shown in FIG. 2, an SMP that classifies an agent as “contained” in one dimension or not may call a first SMP that determines whether the agent is bounded in a first direction and a second SMP that determines whether the agent is bounded in the other direction (if both SMPs signal “1”, then so does the “contained” SMP). From a training perspective, it may be preferable to train SMPs in reverse order of such a hierarchy (e.g., first train SMPs that may not call other SMPs, then train SMPs that may call those SMPs but no others, etc.). Alternatively stated, if the concept relationship for SMPs is top down in terms of complexity (e.g., low-complexity SMPs may call no other SMPs; medium-complexity SMPs may call low-complexity SMPs; high-complexity SMPs may call low- and medium-complexity SMPs), it may be preferable for training to occur bottom up (e.g., train low-complexity SMPs, then medium-complexity, then high-complexity).

If a hierarchy or other concept relationship that limits the available SMPs that an SMP may call exists, it may be based on complexity (as subjectively determined by a human) as in the previous example, but may additionally or alternatively be determined in any manner. However, the concept relationship established in S120 may simply be a flat relationship (e.g., there is no restriction on which SMPs an SMP may call; e.g., all SMPs may call each other). Note also that the concept relationships used for training need not be the same as the concept relationships used in executing a fully trained sensorimotor network (e.g., by S150), and that concept relationships may change over time. Likewise, while a concept relationship may be useful for directing training as in the above example, training need not be performed according to the concept relationship (e.g., it may still be that SMPs may only call less-complex SMPs during training, but instead of training the less-complex SMPs first, then the more-complex SMPs, it may be desirable to train all SMPs at the same time, or train more complex SMPs first).

Concept relationships may be specified manually, but they may additionally or alternatively be specified in any manner. For example, concept relationships may be determined automatically or partially automatically by the results of training on similar networks. Likewise, a concept relationship initially determined may be updated during SMP training based on the results of SMP training.

Note that as used in this document, a statement that a first concept is “higher-order” than a second concept is to be interpreted as specifying that the first concept may call the second concept, but the second concept may not call the first.
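
One way to picture a hierarchical concept relationship and a bottom-up training order is as a "may call" graph that is sorted topologically. The sketch below is illustrative only: the mapping `may_call`, the SMP names, and both helper functions are assumptions, and it relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical concept relationship: each SMP maps to the set of SMPs it may call.
may_call = {
    "bounded_left": set(),
    "bounded_right": set(),
    "contained": {"bounded_left", "bounded_right"},
}


def bottom_up_training_order(may_call):
    """List SMPs so that every SMP appears after all SMPs it may call.

    TopologicalSorter raises graphlib.CycleError if two SMPs may call each
    other, which a strict hierarchy does not permit.
    """
    return list(TopologicalSorter(may_call).static_order())


def is_higher_order(may_call, first, second):
    """True if `first` may call `second` but `second` may not call `first`."""
    return second in may_call.get(first, set()) and first not in may_call.get(second, set())


order = bottom_up_training_order(may_call)
```

A flat relationship, in which all SMPs may call each other, contains cycles and has no such ordering; in that case the training order would have to be chosen some other way.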

S130 includes training a first sensorimotor program and S140 includes training a second sensorimotor program using the first sensorimotor program. While S130 and S140 are substantially similar in function, the explicit mention of S140 highlights that training of SMPs to call other SMPs is an essential part of sensorimotor program training.

As previously mentioned, examples of SMPs include classification and bring-about SMPs. Classification SMPs are preferably trained using reward functions that reward the SMP when the SMP terminates within a set time or number of steps and correctly returns a value consistent with the presence or non-presence of a given concept in an environment. Bring-about SMPs are preferably trained using reward functions that reward the SMP when the SMP successfully brings about a given environmental state, terminates within a set time or number of steps, and correctly returns a value indicating that the SMP has successfully brought about the given environmental state. SMPs may additionally or alternatively be rewarded in any manner. For example, bring-about SMPs may receive shaping rewards when an SMP successfully brings about a concept but does not appropriately signal. The shaping reward may, for instance, provide a smaller reward than the primary reward function. Note that bring-about SMPs may call classification SMPs and vice-versa.

SMPs may be trained in any manner. In one implementation of an invention embodiment, sets of SMPs may be represented by a neural network (e.g., a gated recurrent unit (GRU) network) and trained using natural policy optimization (NPO). Networks may likewise be initialized in any manner, and training may occur over any number of iterations. For example, a given SMP may be trained and evaluated for five different random seeds. When an SMP reuses another SMP, that other SMP may be selected in any manner (e.g., the best performing seed may be selected, one of the seeds may be selected at random, etc.).

SMPs preferably may call other SMPs as an option (e.g., reusing an SMP to perform some set of actions) or as observations (e.g., where the SMP makes use of the output of SMPs it calls, not just the set of actions performed by it).
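
The two modes of reuse could be reflected in how a learner's action and observation spaces are assembled. The sketch below is one possible reading, not the embodiments' implementation; the function names, the tuple encoding of actions, and the example SMP names are all hypothetical.

```python
def build_action_space(primitive_actions, callable_smps):
    """Reuse as an option: each callable SMP becomes one extra selectable action."""
    actions = [("primitive", name) for name in primitive_actions]
    actions += [("call_smp", name) for name in callable_smps]
    return actions


def augment_observation(sensor_values, last_outcomes, callable_smps):
    """Reuse as an observation: append each callable SMP's last outcome signal.

    An SMP with no recorded outcome contributes 0 (assumed "not yet run" value).
    """
    return list(sensor_values) + [last_outcomes.get(name, 0) for name in callable_smps]


callable_smps = ["bounded_left", "bounded_right"]
actions = build_action_space(["move_left", "move_right"], callable_smps)
observation = augment_observation([0.2, 0.7], {"bounded_left": 1}, callable_smps)
```

Here the calling SMP both chooses among four actions (two primitive, two SMP calls) and sees the outcome signals of the SMPs it may call alongside its raw sensor values.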

In addition to other SMPs, an SMP may perform any one of a set of primitive actions. For example, for an SMP controlling a robot arm, primitive actions may include simple movement of the robot arm (e.g., move up, move down, rotate hand, etc.). Other examples of primitive actions may be those related to the control of sensors, actuators, or processing modules; for example, the orientation of a camera, the focus of the camera, values recorded by touch sensors, etc. In general, primitive actions are preferably pure motor or sensory actions (rather than the concepts that are derived from motor and sensory interaction), but primitive actions may be any “base-level” action (i.e., an action that may not call an SMP but rather serves as a building block for an SMP).

SMPs are trained to accomplish a concept. For example, a classification concept example is the “containment” classification (i.e., determine if the agent is contained or not) and a bring-about concept example is bringing-about containment (i.e., bring the agent to a contained state if possible).

S150 includes executing the first and second sensorimotor programs. S150 functions to allow the use of the sensorimotor programs trained in S110-S140 to accomplish a task or determine an environmental state. S150 preferably includes executing the second SMP, and in the process of executing the second SMP, executing the first SMP (e.g., reusing the first SMP as previously described in the method 100).
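
The nesting of execution can be illustrated with a toy example in which a higher-order “contained” SMP calls a lower-level “bounded” SMP once per direction. The SMPs here are hand-written stand-ins for trained policies, and the `GridWorld` environment, the function names, and the step budget are assumptions made purely for illustration.

```python
class GridWorld:
    """Toy 1-D environment (an assumption): a row of cells, some of which are walls."""

    def __init__(self, walls, agent):
        self.walls = list(walls)
        self.agent = agent

    def blocked_by_wall(self, delta):
        """Sensory primitive: is the neighboring cell in direction `delta` a wall?"""
        target = self.agent + delta
        return 0 <= target < len(self.walls) and self.walls[target]

    def step(self, delta):
        """Motor primitive: move one cell if possible; return whether the move happened."""
        target = self.agent + delta
        if 0 <= target < len(self.walls) and not self.walls[target]:
            self.agent = target
            return True
        return False


def bounded_smp(env, direction, max_steps=20):
    """Lower-level classification SMP: 1 if a wall is met, -1 if the edge is reached."""
    for _ in range(max_steps):
        if env.blocked_by_wall(direction):
            return 1
        if not env.step(direction):
            return -1  # ran off the edge of the row without meeting a wall
    return 0  # step budget exhausted before an answer was found


def contained_smp(env, max_steps=20):
    """Higher-order SMP: executes bounded_smp in both directions and combines the signals."""
    left = bounded_smp(env, -1, max_steps)
    right = bounded_smp(env, +1, max_steps)
    if 0 in (left, right):
        return 0
    return 1 if (left == 1 and right == 1) else -1


closed = contained_smp(GridWorld([True, False, False, False, True], agent=2))
open_left = contained_smp(GridWorld([False, False, False, True], agent=1))
timed_out = contained_smp(GridWorld([True, False, False, False, True], agent=2), max_steps=1)
```

In the first world both calls signal “1”, so the higher-order SMP does too; in the second the leftward call reaches the edge and the result is “−1”; in the third the step budget runs out and the result stays at “0”.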

Note that if the first and second sensorimotor programs are trained using a simulation of a real world environment, execution may occur using physical sensors and actuators (e.g., on a robot).

The methods of the preferred embodiment and variations thereof can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions are preferably executed by computer-executable components preferably integrated with a neural network. The computer-readable medium can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device. The computer-executable component is preferably a general or application specific processor, but any suitable dedicated hardware or hardware/firmware combination device can alternatively or additionally execute the instructions.

As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.

We claim:
 1. A method for establishing sensorimotor programs, comprising: determining a first environment that represents a first concept; and training a first sensorimotor program to accomplish the first concept by interacting with the first environment, comprising training the first sensorimotor program using a first reward function that rewards the first sensorimotor program when the first sensorimotor program successfully accomplishes the first concept and correctly returns a value indicating that the first sensorimotor program has successfully accomplished the first concept.
 2. The method of claim 1, wherein the first concept is a bring-about concept; wherein the first reward function rewards when the first sensorimotor program successfully brings about the first concept and correctly returns a value indicating that the first sensorimotor program has successfully brought about the first concept.
 3. The method of claim 1, wherein the first concept is a classification concept; wherein the first reward function rewards when the first sensorimotor program correctly returns a value consistent with the presence or non-presence of the first concept in the first environment.
 4. The method of claim 1, wherein training the first sensorimotor program to accomplish the first concept further comprises training the first sensorimotor program using a second reward function different from the first reward function that rewards when the first sensorimotor program successfully accomplishes the first concept but fails to return a value indicating that the first sensorimotor program has successfully accomplished the first concept.
 5. The method of claim 4, wherein the second reward function is a shaping reward function.
 6. The method of claim 1, wherein training the first sensorimotor program to accomplish the first concept comprises using a set of primitive actions to interact with the first environment.
 7. The method of claim 6, wherein using the set of primitive actions to interact with the first environment comprises pushing an object of the first environment.
 8. The method of claim 7, further comprising executing the trained first sensorimotor program on a robotic arm system comprising a robotic arm actuator, wherein an action of the set of primitive actions actuates the robotic arm actuator.
 9. A method for establishing sensorimotor programs, comprising: generating a first plurality of environments that represents a first concept, wherein the first concept is a bring-about concept; and training a first sensorimotor program to accomplish the first concept using a reward function in each environment of the first plurality, wherein the first sensorimotor program executes actions of a set of primitive actions to accomplish the first concept.
 10. The method of claim 9, wherein the first plurality of environments is generated based on recurring content that enables re-use of learned concepts.
 11. The method of claim 9, wherein each environment of the first plurality of environments is associated with a dynamics model that collectively simulates the actions executed by the first sensorimotor program.
 12. The method of claim 9, wherein generating the first plurality of environments comprises generating a superset of environments and filtering the superset of environments to determine the first plurality of environments.
 13. The method of claim 9, further comprising: generating a second plurality of environments that represent a second concept higher-order than the first concept; and training, using the second plurality of environments, a second sensorimotor program to accomplish the second concept using the first sensorimotor program.
 14. The method of claim 13, wherein the second sensorimotor program is trained using the set of primitive actions, wherein the second sensorimotor program calls the first sensorimotor program as an additional action.
 15. The method of claim 13, wherein the second sensorimotor program calls the first sensorimotor program as an observation.
 16. The method of claim 9, wherein the first sensorimotor program is trained based on a set of actionable lower-level sensorimotor programs associated with different respective bring-about concepts and a set of conceptual lower-level sensorimotor programs associated with different respective classification concepts.
 17. The method of claim 16, wherein the set of actionable lower-level sensorimotor programs and the set of conceptual lower-level sensorimotor programs are trained before the first sensorimotor program is trained.
 18. The method of claim 16, wherein training the first sensorimotor program comprises automatically determining which programs of the actionable lower-level sensorimotor programs and the conceptual lower-level sensorimotor programs enable the first sensorimotor program to accomplish the first concept.
 19. The method of claim 9, wherein the first sensorimotor program is represented by a recurrent neural network.
 20. The method of claim 9, wherein the first sensorimotor program is trained using natural policy optimization. 